Seaborn

Manuela da Cruz Chadreque

11 December 2020



Distribution Plots

Let's discuss some plots that allow us to visualize the distribution of a data set. These plots are:

  • distplot
  • jointplot
  • pairplot
  • rugplot
  • kdeplot

Imports

In [18]:
import seaborn as sns
%matplotlib inline
import numpy as np

Data

Seaborn comes with built-in data sets!

In [2]:
tips = sns.load_dataset('tips')
In [3]:
tips.head()
Out[3]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

distplot

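The distplot shows the distribution of a univariate set of observations: a histogram, by default overlaid with a kernel density estimate (KDE). Note that seaborn 0.11 deprecated distplot in favor of histplot and displot; the calls in this section rely on the older API.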

In [4]:
sns.distplot(tips['total_bill'])
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x2777e8d8898>
In [5]:
sns.distplot(tips['total_bill'], kde=False)
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x2777e9c8588>
In [7]:
sns.distplot(tips['total_bill'], kde=False, bins=30)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x2777faf5278>
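To make the kde=False and bins=30 arguments more concrete, here is a minimal sketch of the binning a histogram performs, using NumPy only. The total_bill array is synthetic stand-in data (an assumption, not the real tips column), and the gamma parameters are arbitrary.

```python
import numpy as np

# Synthetic stand-in for tips['total_bill'] (assumption: not the real data).
rng = np.random.default_rng(0)
total_bill = rng.gamma(shape=4.0, scale=5.0, size=244)

# bins=30 splits the observed range into 30 equal-width intervals,
# which are described by 31 bin edges.
counts, edges = np.histogram(total_bill, bins=30)

# Raw counts: every observation falls into exactly one bin.
print(counts.sum(), len(edges))

# Density scaling: bar areas (height * width) sum to 1, which is the
# scale a KDE curve is drawn on.
density, _ = np.histogram(total_bill, bins=30, density=True)
total_area = np.sum(density * np.diff(edges))
print(total_area)
```

More bins give a finer but noisier picture; fewer bins give a smoother but coarser one.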

jointplot

jointplot() pairs the distribution plots of two variables with a plot of their joint relationship, which makes it a natural choice for bivariate data. The kind parameter selects how the joint relationship is drawn:

  • “scatter”
  • “reg”
  • “resid”
  • “kde”
  • “hex”
In [8]:
sns.jointplot(x='total_bill',y='tip',data=tips,kind='scatter' )
Out[8]:
<seaborn.axisgrid.JointGrid at 0x2777fbfb518>
In [10]:
sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')
Out[10]:
<seaborn.axisgrid.JointGrid at 0x2777fd2f3c8>
In [11]:
sns.jointplot(x='total_bill', y='tip',data=tips,kind='reg' )
Out[11]:
<seaborn.axisgrid.JointGrid at 0x2777fe2ac50>
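The kind='reg' option adds a linear regression fit to the scatter plot. As a rough sketch of what such a fit computes, the snippet below fits a least-squares line with np.polyfit to the five rows shown by tips.head() above. Using np.polyfit here is an illustrative choice, not necessarily what seaborn does internally.

```python
import numpy as np

# The five rows printed by tips.head() earlier in this notebook.
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

# Degree-1 least-squares fit: tip is approximated by slope * total_bill + intercept.
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Residuals are what kind='resid' would plot instead of the raw points.
predicted = slope * total_bill + intercept
residuals = tip - predicted

print(slope, intercept)
print(residuals)
```

With only five points the numbers mean little; the point is the shape of the computation behind the regression line and the residual plot.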

pairplot

pairplot will plot pairwise relationships across an entire dataframe (for the numerical columns) and supports a color hue argument (for categorical columns).

In [12]:
sns.pairplot(tips)
Out[12]:
<seaborn.axisgrid.PairGrid at 0x2777ff19e10>
In [13]:
sns.pairplot(tips, hue='sex', palette='coolwarm')
Out[13]:
<seaborn.axisgrid.PairGrid at 0x2771046e9e8>
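To see which panels a pairplot ends up with, the sketch below lists the numerical columns of a small dataframe and enumerates every (row, column) pair. The tips_head dataframe is a hand-typed copy of a few columns from tips.head(), and the variable names are only illustrative.

```python
import itertools
import pandas as pd

# Hand-typed subset of tips.head(); 'sex' is categorical and is skipped by the grid.
tips_head = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
    'size': [2, 3, 3, 2, 4],
})

# Only numerical columns take part in the pairwise grid.
numeric_cols = tips_head.select_dtypes(include='number').columns.tolist()

# One panel per (row variable, column variable) pair.
grid = list(itertools.product(numeric_cols, repeat=2))

# Off-diagonal panels compare two different variables; diagonal panels
# show the distribution of a single variable.
off_diagonal = [(r, c) for r, c in grid if r != c]

print(numeric_cols)
print(len(grid), len(off_diagonal))
```

A categorical column such as sex does not get its own panels; it is the kind of column you would pass as hue to color the points.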

rugplot

A rugplot is a very simple concept: it draws a dash mark for every point in a univariate distribution. Rug plots are the building block of a KDE plot:

In [15]:
sns.rugplot(tips['total_bill'])
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x27711fb7e80>

kdeplot

kdeplots are Kernel Density Estimation plots. A KDE plot replaces every single observation with a Gaussian (Normal) distribution centered on that value, then sums these curves to obtain a smooth density estimate.
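The construction can be sketched by hand with NumPy: place one Gaussian on each observation, add them up, and normalize. The five observations are the total_bill values from tips.head(); the bandwidth of 2.0 is an arbitrary assumption, whereas seaborn chooses a bandwidth automatically.

```python
import numpy as np

# Observations: total_bill values from tips.head().
data = np.array([16.99, 10.34, 21.01, 23.68, 24.59])

# Assumed bandwidth (standard deviation of each Gaussian kernel).
bandwidth = 2.0

# Evaluation grid, padded so the Gaussian tails are covered.
grid = np.linspace(data.min() - 5 * bandwidth, data.max() + 5 * bandwidth, 1000)

def gaussian(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# One kernel per observation: each row is a Gaussian centered on a data point.
kernels = np.array([gaussian(grid, x_i, bandwidth) for x_i in data])

# The KDE is the average of the kernels, so it integrates to about 1.
density = kernels.sum(axis=0) / len(data)

# Simple Riemann-sum check of the area under the curve.
dx = grid[1] - grid[0]
area = density.sum() * dx
print(area)
```

A smaller bandwidth makes each bump narrower and the estimate wigglier; a larger one smooths the bumps together.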

Categorical Data Plots

Now let's discuss using seaborn to plot categorical data! There are a few main plot types for this:

  • factorplot
  • boxplot
  • violinplot
  • stripplot
  • swarmplot
  • barplot
  • countplot

Let's go through examples of each!

barplot and countplot

These very similar plots let you aggregate data by a categorical feature. barplot is a general plot that aggregates the values within each category using some function, by default the mean:

In [17]:
sns.barplot(x='sex', y='total_bill', data=tips)
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x277120241d0>
In [19]:
sns.barplot(x='sex',y='total_bill', data=tips, estimator=np.std)
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x277120744e0>
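The bar heights correspond to a group-wise aggregation. As a rough sketch, the same numbers can be computed with a pandas groupby; the tips_head dataframe is a hand-typed copy of two columns from tips.head(), so the values differ from those of the full data set.

```python
import numpy as np
import pandas as pd

# Hand-typed subset of tips.head().
tips_head = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
})

grouped = tips_head.groupby('sex')['total_bill']

# Default estimator: the mean of total_bill within each category.
mean_by_sex = grouped.mean()

# Custom estimator, analogous to estimator=np.std in the cell above.
# np.std uses the population formula (ddof=0) by default.
std_by_sex = grouped.agg(lambda s: np.std(s.to_numpy()))

print(mean_by_sex)
print(std_by_sex)
```

Any function that reduces a set of values to one number can play the role of the estimator.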

countplot

This is essentially the same as barplot, except that the estimator explicitly counts the number of occurrences, which is why we only pass the x value:

In [20]:
sns.countplot(x='sex', data=tips)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x277120c1f28>
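The counting step can be reproduced with pandas value_counts. This is only a sketch on the five sex values from tips.head(), not the full data set.

```python
import pandas as pd

# The 'sex' column from tips.head().
sex = pd.Series(['Female', 'Male', 'Male', 'Male', 'Female'], name='sex')

# One bar per category; the bar height is the number of rows in that category.
counts = sex.value_counts()
print(counts)
```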

boxplot and violinplot

Box plots and violin plots are used to show the distribution of a quantitative variable across the levels of a categorical variable. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

In [24]:
sns.boxplot(x='day', y='total_bill', data=tips, palette='rainbow')
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x27712290d30>
In [23]:
# Can do entire dataframe with orient='h'
sns.boxplot(data=tips,palette='rainbow', orient='h')
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x2771221f048>
In [26]:
sns.boxplot(x='day', y='total_bill',hue='sex', data=tips, palette='coolwarm')
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x27712285898>
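The quantities a box plot draws can be computed directly. The sketch below assumes the common 1.5 × IQR rule for flagging outliers (to my knowledge the default whisker setting, but treat it as an assumption). The bills array is hypothetical, with one deliberately extreme value.

```python
import numpy as np

# Hypothetical bills; the last value is deliberately extreme.
bills = np.array([10.34, 16.99, 21.01, 23.68, 24.59, 50.81])

# The box spans the first to the third quartile; the line inside is the median.
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Assumed rule: points more than 1.5 * IQR beyond the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = bills[(bills >= lower_fence) & (bills <= upper_fence)]
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

# Whiskers stop at the most extreme observations still inside the fences.
whisker_low, whisker_high = inside.min(), inside.max()

print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```

Note that the whiskers end at actual data points, not at the fences themselves.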

violinplot

A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.

In [27]:
sns.violinplot(x='day', y='total_bill', data=tips, palette='rainbow')
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x277124940b8>
In [28]:
sns.violinplot(x='day', y='total_bill',hue='sex',data=tips,palette='Set1')
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x2771250d358>
In [32]:
sns.violinplot(x='day',y='total_bill',hue='sex',data=tips,palette='Set1',split=True)
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x27712645b00>

stripplot and swarmplot

The stripplot will draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).

In [34]:
sns.stripplot(x='day', y='total_bill', data=tips)
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x277126c4c88>
In [35]:
sns.stripplot(x='day', y='total_bill', data=tips, jitter=True)
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x2771272b358>
In [37]:
sns.stripplot(x='day', y='total_bill', data=tips, hue='sex',jitter=True, palette='Set1')
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x2771276d898>
In [39]:
# The `split` parameter was renamed to `dodge`; dodge=True separates the hue levels.
sns.stripplot(x='day', y='total_bill', data=tips, hue='sex', jitter=True, palette='Set1', dodge=True)
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x277127b5470>
In [40]:
sns.swarmplot(x='day', y='total_bill', data=tips)
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x2771287b080>
In [41]:
# As with stripplot, use `dodge` instead of the renamed `split` parameter.
sns.swarmplot(x='day', y='total_bill', data=tips, hue='sex', palette='Set1', dodge=True)
Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x27713890dd8>
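The idea behind jitter is simple: each point keeps its value on the numeric axis but gets a small random offset along the categorical axis, so that points with similar values no longer sit on top of each other. Here is a minimal sketch with NumPy; the jitter width of 0.1 and the category indices are assumptions for illustration, not seaborn's internal values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Category position of each observation (e.g. 0 = Thur, 1 = Fri, 2 = Sat).
day_index = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

# Assumed half-width of the random offset along the categorical axis.
jitter_width = 0.1

# Jittered positions: the category index plus a small uniform offset.
x = day_index + rng.uniform(-jitter_width, jitter_width, size=day_index.size)

print(x)
```

A swarm plot replaces the random offset with a placement that avoids overlaps, which is why it costs more computation on large data sets.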

Combining Categorical Plots

In [47]:
sns.violinplot(x='tip', y='day', data=tips, palette='rainbow')
sns.swarmplot(x="tip", y='day', data=tips, color='black',size=3)
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x27713b61668>

factorplot

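factorplot is the most general form of a categorical plot. It takes a kind parameter that selects the plot type. Note that seaborn 0.9 renamed factorplot to catplot, so newer versions expect sns.catplot with the same arguments: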

In [49]:
sns.factorplot(x='sex', y='total_bill', data=tips, kind='bar')
Out[49]:
<seaborn.axisgrid.FacetGrid at 0x27713c19dd8>